Abstract

This paper summarizes a collaborative, six-month ARDA NRRC¹ challenge
workshop to characterize and create analysis methods to counter
sophisticated malicious insiders in the United States Intelligence
Community. Based upon a careful study of past and projected cases,
we report a generic model of malicious insider behaviors,
distinguishing motives, (cyber and physical) actions, and associated
observables. The paper outlines several prototype techniques
developed to provide early warning of insider activity, including
novel algorithms for structured analysis and data fusion. We report
the assessment of their performance in an operational network against
three distinct classes of human insiders (an analyst, application
administrator, and system administrator), measuring timeliness and
accuracy of detection.

¹ This effort was performed at The MITRE Corporation at the Northeast
Regional Research Center (NRRC), which is sponsored by the Advanced
Research and Development Activity in Information Technology (ARDA), a
U.S. Government entity that sponsors and promotes research of import
to the Intelligence Community, which includes but is not limited to
the CIA, DIA, NSA, NGA, and NRO.

1.  The  Threat:  Malicious  Insiders 

An insider is anyone in an organization with approved access,
privilege, or knowledge of information systems, information services,
and missions. A malicious insider (MI) is one motivated to adversely
impact an organization’s mission through a range of actions that
compromise information confidentiality, integrity, and/or
availability. This research explores three fundamental hypotheses
motivated by our study of MIs.

1. While some MIs can be detected using a single cyber observable,
other MIs could be detected only by using multiple, heterogeneous
observables.

2.  Fusing  information  from  heterogeneous  information 
sources  (e.g.,  logs  from  printers,  authentication,  card 
readers,  telephone  calls)  and  various  levels  of  the  IP 
stack  (e.g.,  application  vs.  network  traffic)  allows  more 
accurate  and  timely  indications  and  warning  of  malicious 
insiders. 
3. Observables together with domain knowledge (e.g., user role, asset
value to mission) can help detect inappropriate behavior (e.g.,
need-to-know violations).

To maximize progress in this challenge workshop, we created multiple
working groups: one responsible for our experimentation data and
network, one using StealthWatch sensors (which perform traffic and
host profiling), another using honeynets, another using structured
analysis models, and another using bottom-up fusion across multiple
sensors to detect insiders.

2.  Historical  MI  Case  Analysis 

The first step in our approach was an analysis of prior malicious
insiders. While we investigated information on dozens of insider
cases (DSS 1999, Herbig and Wiskoff 2002), we performed detailed
analysis on six cases. Maybury et al. (2004) summarizes key features
of three representative cases: CIA’s Aldrich “Rick” Ames, FBI’s
Robert Philip Hanssen (2003), and DIA’s Ana Belen Montes (2001). For
each of these cases we summarize the insider’s position, motive,
foreign handlers, impact, sentence, computer skill, polygraph
experience, cyber security violations, counterintelligence
activities, physical and cyber access, cyber extraction and
exfiltration, cyber communication, and the transfer of materials to
foreign handlers.

The devastating impact of these three individuals included the
violation of confidentiality, undermining of intelligence integrity,
adverse influence on US policy, the revelation of sources and
methods, and the death and compromise of field agents. Motives were
diverse, ranging from financial to thrill to ideological. In each of
these cases, handlers were professional foreign service agents. Two
of the three passed polygraphs. While the computer skills of these
insiders varied significantly, all left trails of suspicious cyber
activity while performing cyber access, exfiltration, and/or
communication. All engaged in counterintelligence to evade detection
and/or destroy incriminating evidence. In each case we found
opportunities to observe individual incidents and/or to detect
anomalous behavior from correlated observables.

In addition to these historic cases, we also projected a future
insider in the role of a systems or network administrator who would
have significantly deeper computing skill and infrastructure access.
This would enable, for example, more stealthy attacks (e.g., the MI
might not have to perform network reconnaissance, or could create
private communication channels or open up backdoors) as well as new
kinds of attacks, such as attacks on availability in which the
insider’s objective is to degrade, damage, or destroy the network.


3. Simulated MIs: Pal, Jill, and Jack

Grounding our efforts in realistic insider behavior, we explored
detecting three types of insiders in detail in this activity. The
first was a historical insider modeled as a prototype of past
need-to-know violators. We call this insider Pal. A second insider,
named Jack, was a projected insider who would aim to disrupt, damage,
or destroy the network or elements thereof. In the course of defining
and simulating these insiders, the scenario team implemented a third
category of insider, an application administrator, called News Admin
or Jill. Only Pal’s behavior model was disclosed to sensor builders
prior to the experiment. For detail about these insiders, including a
log of specific actions taken by the insiders, see Maybury et al.
(2004). The three malicious insider cases were simulated on MITRE’s
Demilitarized Zone (DMZ) network. The DMZ consists of over 300 hosts
with a range of missions utilizing services such as web (HTTP), news
(NNTP), file transfer (FTP), messaging (SMTP), mail (POP, IMAP),
database (SQL), and question answering. We instrumented 18 of 31
nodes on the NRRC (Northeast Regional Research Center) subnetwork,
which had 75 on-line, active users during the evaluation.

A semi-automated process captured, filtered, and anonymized the
malicious insider collection to address security and privacy
concerns. Figure 1 illustrates the heterogeneous nature of the
collection, which consists of over 11 million records spanning
physical sensors (e.g., employee badge readers), network level
sensors (e.g., Snort rules modified to detect inappropriate
connections or behavior), host sensors (to detect user access and
command sequences), and applications (e.g., mail server logs, web
server logs, network news logs). A Common Data Repository (CDR) was
established as a central database storing the over 11 million
anonymized, time-stamped audit-log records collected over three
months.


Figure 1. Heterogeneous and Multilevel Data Sources (levels
recoverable from the figure: Application, Host, Physical)

4.  Event  and  Observable  Taxonomy 

In order to access, exploit, or damage assets, an MI will necessarily
need to perform (or have another person or process perform) a series
of actions to gain privileges, access, or manipulate assets. Derived
from our analysis of MI cases, Figure 2 shows a taxonomy of cyber
events which have associated observables that hold promise for the
foundation of a detection system. The taxonomy distinguishes
observables in the cyber domain from those in the physical domain.
The taxonomy includes observables such as results of the polygraph,
records of security violations, missing or misleading reports on
finances, foreign travel or foreign contacts, physical facility
access, personal finances, materials transfer, counterintelligence,
social behavior, and communications. In this research we focused
exclusively on cyber observables, including other observables that
could be readily converted to a cyber signal (e.g., digitized
facility access logs).


Figure 2. Cyber Event/Observable Taxonomy

The core of the taxonomy incorporates a range of cyber observables
encompassing the classes of cyber actions indicated in bold italics
in Figure 2. These include activities of network, system, and
information reconnaissance, access to assets (e.g., media, hosts,
accounts), entrenchment (e.g., installing sensors or unauthorized
software), exploitation (e.g., commanding and controlling entrenched
assets such as software bots or zombie machines), extraction and
exfiltration (e.g., of hardcopy, media, information), communication
(e.g., encrypted messaging, encoded messages, covert channels),
manipulation of cyber assets (e.g., changing file permissions,
suppressing or altering information content), counterintelligence
(e.g., wiping disks), and other cyber activities associated with
unethical or addictive behavior (e.g., online gambling). Some
observables have been used in historical cases as a tip-off of
malicious activity; others serve as direct indicators of
inappropriate behavior.

5. Insider Detection

While the live network instrumentation described in Section 3
provided an unprecedented and essential set of MI experimental data,
the thrust of our activity was developing novel algorithms to detect
MIs. Figure 3 illustrates the high level architecture of a proof of
concept system that was designed, implemented, and tested to detect
MIs. Distributed, heterogeneous sensors provide input to a Common
Data Repository (CDR) from which a range of analyses are performed,
including data fusion and structural analysis, to identify potential
suspects on a watch list or issue an alert of an insider threat. As
illustrated in Figure 3, our technical approach is novel in the
following respects:

• A Common Data Repository (CDR) captures and anonymizes
heterogeneous sensor input.

• Multilevel monitoring occurs at the packet level, system level,
and application level.

• StealthWatch sensors detect abnormal insider behavior on the
network such as scanning, file transfer, or internal network
connections.

• Distributed honeynets acquire attacker properties, pre-attack
intentions, and potential attack strategies.

• A real-time, top-down structural analysis drawing upon functional
models of MIs maps pre-attack indicators to models of potential MIs.

• Traditional and non-traditional indicators (e.g., logs of network
activity, physical access, PBX, help desks), including non-digital
sources, are fused bottom-up.

Sensor inputs are then exploited by a decision analysis component to
determine watch list membership and insider detection. We next
consider each of the primary detection strategies.

Figure 3. Integrated Architecture for Insider Detection System.
Common data sources shown in the figure:

- Authentication, Mail, DMZ Servers, IDS, Honeynet, Badge Data
- Application Logs (e.g., web, DB, mail)
- Nessus Scans (vulnerability analysis)
- Switch logs, StealthWatch logs

5.1  HoneyTokens 

Honeypots are realistic but dummy systems that reflect true
production systems and are designed to attract malicious users to
inappropriately access resources. Combined with subtly advertised
enticements to potential insider threats, honeypots provide a
mechanism to determine what motivates the inside attacker and what
capabilities the attacker possesses.
A novel idea developed during the workshop and applied in the insider
detection process is the notion of a honeytoken. A honeytoken is a
semi-valuable piece of information whose use can be readily tracked.
This could be a credit card number, an Excel spreadsheet, a database
entry, or a login and password. A honeytoken is an entity that has no
authorized use. Honeytokens can be used for the initial detection of
insider threats; those threats can then be redirected to honeynets to
confirm whether a violation has occurred, potentially learning more
about the threat.

In the Pal scenario described in Section 3, the honeytoken takes the
form of a web page which lists (fictitious) operatives in the
geographic region of interest to the MI. The data fusion group,
detailed in a subsequent section, exploits the detection of
honeytoken access as one of a range of indicators of malicious
behavior. In other examples of honeytokens (e.g., login password) it
is possible to use the false information to track activities (e.g.,
in a controlled account) to more readily discover MI actions,
capabilities, and intentions.
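
Because a honeytoken has no authorized use, any access to it can be
treated as an indicator. The following is a minimal sketch of that
idea for a web-page honeytoken, not the workshop's implementation.
The path, the record fields, and the user names are hypothetical and
chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical honeytoken: a web page path that has no authorized use.
HONEYTOKEN_PATHS = {"/regional/operatives.html"}


@dataclass(frozen=True)
class WebAccess:
    """One anonymized web server log record (illustrative fields)."""
    user: str
    timestamp: str
    path: str


def honeytoken_hits(accesses, honeytokens=HONEYTOKEN_PATHS):
    """Return every access record that touched a honeytoken path."""
    return [a for a in accesses if a.path in honeytokens]


# Made-up log records, for illustration only.
log = [
    WebAccess("user_a", "2004-01-09 20:57:18", "/regional/operatives.html"),
    WebAccess("user_b", "2004-01-09 21:03:02", "/news/index.html"),
]
hits = honeytoken_hits(log)
flagged_users = [h.user for h in hits]
```

Each hit would be passed on as one indicator among several, rather
than being treated as proof of malicious intent on its own.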

5.2 StealthWatch

One of the sensor types deployed in support of the insider threat
workshop was StealthWatch. StealthWatch (http://www.lancope.com)
provides traffic profiling (e.g., data flow analysis) and host
profiling such as zone profiling (e.g., monitoring which hosts can
talk to which). StealthWatch creates a concern index (CI) based on
flow analysis, reconnaissance activity, and anomaly detection. Any
host reaching an accumulated CI level above 20,000 generates an
alarm. StealthWatch was deployed to monitor scanning, downloads, and
inside connections. StealthWatch enabled an analyst to identify Jack
within a day of his malicious activity, which began on February
11th. However, StealthWatch did not detect Pal and Jill, who
exhibited no scanning, download, or inside connection anomalies.
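
The alarm rule described above (an accumulated concern index above
20,000) can be illustrated with a short sketch. Only the threshold
comes from the text; the event names and their point values are
hypothetical, and the actual StealthWatch scoring is not described
here.

```python
from collections import defaultdict

ALARM_THRESHOLD = 20_000  # alarm level stated in the text

# Hypothetical point values per event type (assumption, not vendor data).
EVENT_POINTS = {
    "scan": 6_000,
    "large_download": 9_000,
    "inside_connection": 4_000,
}


def hosts_in_alarm(events, threshold=ALARM_THRESHOLD):
    """Accumulate a concern index per host and list hosts above threshold.

    `events` is an iterable of (host, event_type) pairs in time order.
    Returns (alarmed_hosts_in_order_of_alarm, final_index_per_host).
    """
    concern_index = defaultdict(int)
    alarmed = []
    for host, kind in events:
        concern_index[host] += EVENT_POINTS[kind]
        if concern_index[host] > threshold and host not in alarmed:
            alarmed.append(host)
    return alarmed, dict(concern_index)


# Made-up event stream, for illustration only.
events = [
    ("host-17", "scan"),
    ("host-04", "inside_connection"),
    ("host-17", "scan"),
    ("host-17", "large_download"),
]
alarmed, index = hosts_in_alarm(events)
```

Under these assumed values, a host that only makes ordinary inside
connections never accumulates enough points to alarm, which is
consistent with insiders such as Pal and Jill going unnoticed by this
sensor.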

5.3  Structured  Analysis 

Using the cyber observable framework introduced in Figure 2, the
structured analysis approach considered a range of cyber observables
associated with adversaries to perform top-down, real-time,
model-based detection of MIs. Knowledge of network mission and
network configuration, together with likely adversarial actions, was
used to filter and highlight adversary behavior. MI actions modeled
included reconnaissance (e.g., via web browsing or net scans), access
(e.g., privilege escalation), entrenchment (e.g., via deployment of
sensors), extraction and exfiltration (e.g., unauthorized downloads),
and communications (e.g., coded messages or covert channels). A range
of sensors and data logs are relevant to detecting many of the cyber
actions. For example, email communication patterns would provide
insight into the social network of a malicious insider. Similarly,
large downloads might signal data exfiltration.


The structural analysis group (SAG) modeled two insiders, Pal and
Jack, considering temporal characteristics of protocols such as event
proximity (e.g., immediate vs. days vs. years) and observable
ordering. The Pal detector exhibited 3% false positives and no false
negatives, and the Jack detector had 1% false positives and 50% false
negatives.
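
To make the roles of observable ordering and event proximity
concrete, the sketch below checks whether a user's events contain a
modeled sequence of action classes in order, with consecutive steps
no further apart than a maximum gap. This is an illustrative
assumption about how such a model could be matched, not the SAG
algorithm; the model, the gap values, and the events are hypothetical.

```python
from datetime import datetime, timedelta


def matches_model(events, model, max_gap):
    """Return True if `events` contains the action classes in `model`
    in order, with each consecutive matched pair at most `max_gap` apart.

    `events` is a list of (timestamp, action_class) pairs.
    """
    # Timestamps at which the first model step can be matched.
    frontier = [t for t, action in events if action == model[0]]
    for step in model[1:]:
        # Keep timestamps of this step that follow some earlier match
        # within the allowed gap.
        frontier = [
            t
            for t, action in events
            if action == step
            and any(timedelta(0) < t - prev <= max_gap for prev in frontier)
        ]
    return bool(frontier)


# Hypothetical model and made-up events, for illustration only.
model = ["reconnaissance", "access", "exfiltration"]
events = [
    (datetime(2004, 1, 5, 9, 0), "reconnaissance"),
    (datetime(2004, 1, 6, 10, 0), "access"),
    (datetime(2004, 1, 8, 23, 0), "exfiltration"),
]
within_three_days = matches_model(events, model, timedelta(days=3))
within_one_day = matches_model(events, model, timedelta(days=1))
```

Tightening the allowed gap makes the model more specific (fewer false
positives) at the cost of missing slower insiders, which is one way
to read the trade-off between the false positive and false negative
rates reported above.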

The structured analysis approach detected all three simulated MIs,
Pal, Jill, and Jack, at various times. Pal was put on the watch list
on December 11th, 2003, two days after Pal initiated behavior. Six
days later an alert was issued for Pal. The News Administrator Jill
was put on the watch list on January 24th, 2004, even before her
suspicious behavior started on February 11th. Four days after that
start, on February 15th, SAG issued an alert for Jill. Finally, SAG
issued an alert on February 17th for Jack, six days after he
initiated malicious administrator behavior.

5.4  Data  Fusion 

In contrast to the preceding model-based, top-down analysis of
observables, a data fusion approach explored bottom-up combination of
observables from multiple sensors, including the combination,
correlation, and analysis of both traditional and novel indicators
stored in the Common Data Repository. This low-level data came from
available resources on the MITRE DMZ network along with inputs from
other teams (StealthWatch and honeynets). Thus data fusion occurred
over input from the physical, network, host, and application levels.
Fusion can occur along a number of dimensions such as the type of
sensor (e.g., card reader, authentication, printer, telephone calls)
and the level of the IP stack (e.g., from network to application).

A  proof-of-concept  data  fusion  engine  was  developed  that: 

• Fused multiple indicators that were related by IP, user name, etc.
into a single indicator

• Allowed customer-selectable weights for each element in the
observable taxonomy

• Allowed site-specific criteria for generating an alert and for
submission of names to a watch list based on:

o Accumulated indicator weight
o Accumulated indicator breadth, i.e., the number of unique sources
of evidence

In addition, a novel sensor was developed that performed email
consistency checking. The email sensor analyzed PGP signatures and
PGP-encrypted attachments and generated error messages to report if
the PGP data did not decode properly. A simple, flexible data fusion
engine was implemented in the JESS rule system.
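
The two accumulation criteria listed above, indicator weight and
indicator breadth, can be sketched as follows. This is a Python
illustration rather than the JESS rules used in the workshop. The
users, categories, and weights are taken from the log entries in
Figure 4; the `source` labels and all threshold values are
hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Indicator(NamedTuple):
    user: str
    category: str   # element of the observable taxonomy
    weight: int     # customer-selectable weight
    source: str     # hypothetical label for the source of evidence


def fuse(indicators):
    """Return {user: (accumulated_weight, breadth)} where breadth is the
    number of unique sources of evidence for that user."""
    weight = defaultdict(int)
    sources = defaultdict(set)
    for ind in indicators:
        weight[ind.user] += ind.weight
        sources[ind.user].add(ind.source)
    return {user: (weight[user], len(sources[user])) for user in weight}


def classify(weight, breadth, watch_weight=10, alert_weight=20, alert_breadth=3):
    """Apply site-specific criteria (threshold values are assumptions)."""
    if weight >= alert_weight and breadth >= alert_breadth:
        return "alert"
    if weight >= watch_weight:
        return "watch"
    return "none"


indicators = [
    Indicator("user324", "Cyber-Access", 1, "host"),
    Indicator("user295", "Physical-Access", 5, "badge"),
    Indicator("user295", "Cyber-Reconnaissance", 10, "web"),
    Indicator("user9", "Cyber-Communication", 15, "mail"),
    Indicator("user295", "Cyber-Extraction-Exfiltration", 8, "mail"),
]
fused = fuse(indicators)
decisions = {user: classify(w, b) for user, (w, b) in fused.items()}
```

Requiring breadth as well as weight for an alert means that a single
noisy sensor cannot raise an alert on its own, while still allowing
one heavy indicator to place a user on the watch list.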

Figure 4 illustrates a series of logged messages from the operation
of the data fusion engine that report various inferences being made
on the basis of sensor input for the scenarios on the DMZ network
introduced in Section 3.

The data fusion engine was able to find indications of malicious
activity in two out of the three scenarios, with little tuning. One
in-the-wild incident was also discovered: a user who was excessively
downloading news. Perhaps equally significant is the substantial data
reduction that resulted. The data fusion engine needed to examine 7.4
million records to analyze the activity of the 75 users on the
system, and it generated 259 indicators for 24 of those users. The
data fusion approach exhibited very low false positive rates (0 or
3%) with moderate to high false negative rates (33 to 66%) using a
threshold of the maximum change between any two detected users.


Cyber-Access, user324, weight 1, at 2003-12-10 11:14:38, from
news.mitre.org
su to user9676 failed for non-admin user user324 on /dev/pts/0

Physical-Access, user295, weight 5, at 2003-12-15 19:19:37,
After hours badge access for user295

Cyber-Extraction-Exfiltration, user2649, weight 5, at 2004-01-06
15:37:28, from nrrc-springfield.mitre.org,
Data was uploaded to an external server via FTP protocol

Cyber-Reconnaissance, user295, weight 10, at 2004-01-09 20:57:18, from
nrrc-springfield.mitre.org,
User user295 searching in non-need-to-know country korea

Cyber-Communication, user9, weight 15, at 2004-02-10 22:14:48, from
cvw.mitre.org,
User user9 received email with masqueraded content from
user11649@yahoo.com

Cyber-Reconnaissance, user1, weight 5, at 2004-02-10 13:54:15, from
nrrc-plymouth.mitre.org,
Ongoing CI violation -- 066.170.227.074 currently has 49613 alerts
of this type...

Cyber-Extraction-Exfiltration, user295, weight 8, at 2004-02-12 23:54:58,
from dmzsrv1.mitre.org,
User user295 sent encrypted email to user9983@comcast.net

Cyber-Extraction-Exfiltration, user1, weight 15, at 2004-02-20 12:25:03,
from nrrc-erie.mitre.org,
user1 sent email with masqueraded content to user1@mitre.org

Figure  4.  Data  Fusion  Log 


6.  Evaluation:  Accuracy  and  Speed 

We designed evaluation methods to assess both the accuracy and
timeliness of insider threat detection. Accuracy was measured in
terms of false positives and false negatives, defined as:

• False positives = # users incorrectly detected as malicious
insiders

• False negatives = total # actual malicious insiders - # correctly
detected
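
The two definitions above translate directly into set operations. The
sketch below is illustrative; the detected set, including the name
`user42`, is made up.

```python
def detection_errors(detected, actual):
    """Return (false_positives, false_negatives) as counts.

    false positives = users detected who are not actual insiders
    false negatives = actual insiders minus those correctly detected
    """
    detected, actual = set(detected), set(actual)
    false_positives = len(detected - actual)
    false_negatives = len(actual) - len(detected & actual)
    return false_positives, false_negatives


actual_insiders = {"Pal", "Jill", "Jack"}
detected_users = {"Pal", "Jack", "user42"}  # hypothetical detector output
fp, fn = detection_errors(detected_users, actual_insiders)
```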


For each sensor, the maximum change between any two detected users is
computed, and then false positive and false negative measures are
computed at several points along the graph. Detection results were
measured at three weighted thresholds: maximum change, 50% of the
highest factor, and 10% of the highest factor.
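
One possible reading of these three thresholds is sketched below. It
assumes that each detected user has a single accumulated score, that
"maximum change" means cutting the ranked list at the largest drop
between adjacent scores, and that "highest factor" means the highest
score. These are interpretive assumptions, and the scores are made
up.

```python
def max_change_cut(scores):
    """Users ranked above the largest drop between adjacent scores."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) < 2:
        return {user for user, _ in ranked}
    drops = [ranked[i][1] - ranked[i + 1][1] for i in range(len(ranked) - 1)]
    cut = drops.index(max(drops))  # position of the largest drop
    return {user for user, _ in ranked[: cut + 1]}


def fraction_cut(scores, fraction):
    """Users whose score is at least `fraction` of the highest score."""
    top = max(scores.values())
    return {user for user, score in scores.items() if score >= fraction * top}


# Made-up accumulated scores, for illustration only.
scores = {"a": 40, "b": 36, "c": 12, "d": 5, "e": 3}
at_max_change = max_change_cut(scores)
at_50_percent = fraction_cut(scores, 0.5)
at_10_percent = fraction_cut(scores, 0.1)
```

Lower thresholds admit more users, so false negatives can only fall
while false positives can only rise as the threshold moves from
maximum change toward 10%.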

In contrast to accuracy, timeliness was measured as the elapsed time
from the start of an insider’s malicious activity to the time at
which the insider is put on a watch list and to the time at which an
insider threat alarm occurs.
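
As a worked illustration of this measure, the sketch below computes
both intervals for the structured analysis results reported in
Section 5.3. Pal's start date (December 9th) and alert date (December
17th) are derived from the stated offsets rather than given
explicitly, and no watch list date is reported for Jack, so it is
left as `None`.

```python
from datetime import date


def lead_times(start, watch, alert):
    """Return (days_to_watch_list, days_to_alert) relative to `start`.

    A negative value means the event preceded the start of malicious
    activity; None means the date was not reported.
    """
    def days(d):
        return None if d is None else (d - start).days
    return days(watch), days(alert)


# (start of activity, watch list date, alert date) per insider.
sag_results = {
    "Pal": (date(2003, 12, 9), date(2003, 12, 11), date(2003, 12, 17)),
    "Jill": (date(2004, 2, 11), date(2004, 1, 24), date(2004, 2, 15)),
    "Jack": (date(2004, 2, 11), None, date(2004, 2, 17)),
}
timeliness = {name: lead_times(*dates) for name, dates in sag_results.items()}
```

Jill's negative watch list interval reflects the report that she was
placed on the watch list before her suspicious behavior began.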

We consider these measures stricter than would be found in an
operational environment, which would include a human analyst in the
loop to make alert decisions (StealthWatch results did include a
human analyst, whereas the other methods were autonomous).
Figure 5. Timeliness of Detection Results (row labels recoverable
from the figure: StealthWatch, Structured Analysis)


Figure 5 summarizes the timeliness performance of the StealthWatch,
structured analysis, and data fusion detectors for the three
insiders: Pal, Jill, and Jack. One objective was to reduce the time
from defection to detection from years to months, weeks, or days, if
not minutes. In Figure 5, the eye icon indicates the day when the
sensor put the insider on the watch list. The bell indicates the
first day when an alert is issued. The black vectors indicate the
start (left end of the vector) and stop (right of the arrowhead)
times of the malicious behavior by the three human MIs. The bold
dates are associated with the StealthWatch sensor (which detected
only Jack), the underlined dates are for the structured analysis
method, and the italicized ones are for data fusion. In summary, when
reviewed across all sensors and methods, two of our three MIs were
detected within one week of their initiation of suspicious activity
and the third was detected within two weeks. Even if some of the more
obvious indicators, such as Jack’s scanning behavior, were removed,
Jack would still be detected because a multiplicity of sensors
provides evidence for inferences.

7.  Summary 

Malicious  insiders  pose  perhaps  the  most  serious  threat  to 
organizational  cyber  assets.  Malicious  insider  behavior  is 
distinct  from  that  of  classical  external  intruders  and  cannot 
be  detected  using  traditional  intrusion  detection  methods.  In 
this  article,  we  report  results  from  a  challenge  workshop  that 
demonstrated  how  an  integration  of  multiple  approaches 
promises  early  and  effective  warning  and  detection  for  a 
range  of  insider  threats.  The  primary  contributions  of  this 
work  include: 

• A taxonomy of cyber assets and cyber actions associated with known
malicious insider behavior

• An attribute-based model of known insiders correlated with cyber
indicators: a classification of classic insider classes (e.g.,
need-to-know violators motivated by moral objectives like Montes, as
opposed to vengeful system administrators) and measures of detection
difficulty

• A live network test using simulated malicious insiders modeled on
known and projected cases

• Creation of an eleven million record data set of heterogeneous
cyber events including physical access (e.g., badge logs), host
access/administration (e.g., password, su, login), user/application
level (e.g., web, mail, network news), and network security (e.g.,
StealthWatch, Snort)

• Real-time detection of insiders Pal (analyst representing a
historical need-to-know violator), Jill (application administrator),
and Jack (system administrator and projected network attacker)
exploiting data fusion using a carefully selected set of
heterogeneous sensors

The workshop insider cases and dataset are being reused by
researchers and have inspired new sensor development. However, while
this research makes initial contributions to the malicious insider
problem, it equally raises many new research directions. These
include the need for more refined malicious insider models, more
elaborate cyber actions/observables taxonomies, more comprehensive
test corpora, and more sophisticated detection algorithms.